Using Pivots to Speed-Up k-Medoids Clustering

Authors

  • Adriano Arantes Paterlini
  • Mario A. Nascimento
  • Caetano Traina
Abstract

Clustering is a key technique within the KDD process, with k-means and the more general k-medoids being well-known incremental partition-based clustering algorithms. A fundamental issue within this class of algorithms is finding an initial set of medians (or medoids) that improves the efficiency of the algorithms (e.g., by accelerating their convergence to a solution) while also improving their effectiveness (e.g., by finding more meaningful clusters). In this article we therefore aim to provide a technique that, given a set of elements, quickly finds a very small number of elements to serve as medoid candidates for that set, improving both the efficiency and the effectiveness of existing clustering algorithms. We target the class of k-medoids algorithms in general and propose a technique that selects a well-positioned subset of central elements to serve as the initial set of medoids for the clustering process. Our technique requires substantially fewer distance calculations than existing methods, thus improving the algorithm's efficiency without sacrificing effectiveness. A salient feature of the proposed technique is that it is not a new k-medoids clustering algorithm per se; rather, it can be used in conjunction with any existing clustering algorithm based on the k-medoids paradigm. Experimental results on both synthetic and real datasets confirm the efficiency, effectiveness and scalability of the proposed technique.
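The abstract does not spell out how the pivots are used to pick the initial medoids, so the sketch below only illustrates the setting such a technique plugs into: a generic alternating ("k-means-like") k-medoids loop that accepts any set of initial medoid indices and counts distance evaluations, the cost measure the abstract reports. The names `kmedoids`, `CountingDistance` and `assign`, the Euclidean metric and the random seeding are all illustrative assumptions, not the authors' code; a pivot-based seeding would simply be passed in as `init`.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


class CountingDistance:
    """Wraps a metric and counts every evaluation (the cost the abstract measures)."""

    def __init__(self, metric: Callable[[Point, Point], float]):
        self.metric = metric
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return self.metric(a, b)


def assign(points: Sequence[Point], medoids: List[int], dist) -> List[int]:
    """For each point, the position in `medoids` of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
            for p in points]


def kmedoids(points: Sequence[Point], init: List[int], dist, max_iter: int = 100):
    """Alternating k-medoids seeded with arbitrary initial medoid indices `init`.

    Hypothetical illustration: the seeding strategy is whatever the caller passes.
    """
    medoids = list(init)
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for j, m in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid of an empty cluster
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest sum of distances to its cluster.
            new_medoids.append(min(
                members, key=lambda c: sum(dist(points[c], points[i]) for i in members)))
        if new_medoids == medoids:  # no medoid moved: converged
            break
        medoids = new_medoids
    labels = assign(points, medoids, dist)
    cost = sum(dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))
    return medoids, labels, cost


# Usage: three Gaussian blobs, seeded with a random sample of 3 indices (assumed seeding).
random.seed(0)
pts = [(random.gauss(cx, 0.5), random.gauss(cy, 0.5))
       for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
dist = CountingDistance(math.dist)
medoids, labels, cost = kmedoids(pts, random.sample(range(len(pts)), 3), dist)
print(f"cost={cost:.2f}, distance evaluations={dist.calls}")
```

Because the update step evaluates all pairwise distances inside each cluster, a seeding that starts close to good medoids reduces both the number of iterations and the total `dist.calls`, which is the kind of saving the abstract attributes to its initialization technique.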


Similar articles

A Novel Spatial Clustering with Obstacles Constraints Based on PNPSO and K-Medoids

In this paper, we propose a novel Spatial Clustering with Obstacles Constraints (SCOC) based on Dynamic Piecewise Linear Chaotic Map and Dynamic Nonlinear Particle Swarm Optimization (PNPSO) and K-Medoids, which is called PNPKSCOC. The contrastive experiments show that PNPKSCOC is effective and has better practicalities, and it performs better than PSO K-Medoids SCOC in terms of quantization er...


Analysis and Implementation of Modified K-medoids Algorithm to Increase Scalability and Efficiency for Large Dataset

Clustering plays a vital role in data mining research. Clustering is the process of partitioning a set of data into meaningful subclasses called clusters. It helps users understand the natural grouping of the data set. It is unsupervised classification, meaning that it has no predefined classes. Applications of cluster analysis include Economic Science, Document cla...


Clustering of Amino Acid Sequences Based on K-Medoids Method

We describe a new approach to clustering amino acid sequences using the K-Medoids method. This method combines the K-Medoids method, dynamic programming and other new theories in biology. Experiments have shown that our method obtains satisfactory results. We believe that the method proposed in this paper is a powerful and flexible tool for clustering amino acid sequences. Keywords: C...


A K-means-like Algorithm for K-medoids Clustering

Clustering analysis is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. This paper proposes a new algorithm for K-medoids clustering which runs like the K-means algorithm and tests several methods for selecting initial medoids. The proposed algorithm calculates the distance matrix once and uses it for finding new medoids at every i...

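The snippet above describes a k-medoids variant that runs like k-means and computes the distance matrix only once, reusing it to find new medoids at every iteration. Below is a minimal pure-Python sketch of that idea, assuming Euclidean distance on small numeric tuples; the function names are hypothetical, and the initial medoids are passed in by the caller because the snippet does not say which initialization methods the paper tested.

```python
import math
from typing import List, Sequence, Tuple


def distance_matrix(points: Sequence[Tuple[float, ...]]) -> List[List[float]]:
    """All pairwise Euclidean distances, computed once: n*(n-1)/2 evaluations."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def kmedoids_from_matrix(d: List[List[float]], init: List[int], max_iter: int = 100):
    """k-means-like k-medoids that only reads the precomputed matrix `d`."""
    n = len(d)
    medoids = list(init)
    for _ in range(max_iter):
        # Assignment step: nearest medoid by table lookup, no new distance calls.
        labels = [min(range(len(medoids)), key=lambda j: d[i][medoids[j]]) for i in range(n)]
        # Update step: each cluster's new medoid minimises its within-cluster distance sum.
        new_medoids = []
        for j, m in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == j]
            new_medoids.append(
                min(members, key=lambda c: sum(d[c][i] for i in members)) if members else m)
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    labels = [min(range(len(medoids)), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    cost = sum(d[i][medoids[labels[i]]] for i in range(n))
    return medoids, labels, cost


pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
print(kmedoids_from_matrix(distance_matrix(pts), init=[0, 1]))
# -> ([1, 4], [0, 0, 0, 1, 1, 1], 4.0)
```

The design trade-off is that all n(n-1)/2 distances and O(n²) memory are paid up front, after which every iteration is pure lookups; this suits small-to-medium datasets but not the large ones targeted by approaches that avoid computing most distances at all.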

Colour image segmentation using K-Medoids Clustering

K-medoids clustering is used as a tool for clustering colour space based on the distance criterion. This paper presents a colour image segmentation method which divides colour space into clusters. Using various colour images, we try to show, both theoretically and experimentally, that K-medoids converges to an approximation of the optimal solution under this criterion. Her...



Journal:
  • JIDM

Volume: 2


Published: 2011